NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A Metric for Measuring the Impact of Rare Paths on Program Coverage

St_Amour, Leo; Tilevich, Eli; Gulzar, Muhammad Ali (January 2025, Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2025))

Full Text Available
TraceFL: Interpretability-Driven Debugging in Federated Learning via Neuron Provenance

https://doi.org/10.1109/ICSE55347.2025.00128

Gill, Waris; Anwar, Ali; Gulzar, Muhammad Ali (April 2025, Proceedings of the IEEE/ACM 47th International Conference on Software Engineering)

Not AvailaIn Federated Learning, clients train models on local data and send updates to a central server, which aggregates them into a global model using a fusion algorithm. This collaborative yet privacy-preserving training comes at a cost. FL developers face significant challenges in attributing global model predictions to specific clients. Localizing responsible clients is a crucial step towards (a) excluding clients primarily responsible for incorrect predictions and (b) encouraging clients who contributed high-quality models to continue participating in the future. Existing ML debugging approaches are inherently inapplicable as they are designed for single-model, centralized training. We introduce TraceFL, a fine-grained neuron provenance capturing mechanism that identifies clients responsible for a global model's prediction by tracking the flow of information from individual clients to the global model. Since inference on different inputs activates a different set of neurons of the global model, TraceFL dynamically quantifies the significance of the global model's neurons in a given prediction, identifying the most crucial neurons in the global model. It then maps them to the corresponding neurons in every participating client to determine each client's contribution, ultimately localizing the responsible client. We evaluate TraceFL on six datasets, including two real-world medical imaging datasets and four neural networks, including advanced models such as GPT. TraceFL achieves 99% accuracy in localizing the responsible client in FL tasks spanning both image and text classification tasks. At a time when state-of-the-art ML debugging approaches are mostly domain-specific (e.g., image classification only), TraceFL is the first technique to enable highly accurate automated reasoning across a wide range of FL applications.ble
more » « less
Free, publicly-accessible full text available April 26, 2026
DeSQL: Interactive Debugging of SQL in Data-Intensive Scalable Computing

https://doi.org/10.1145/3643761

Haroon, Sabaat; Brown, Chris; Gulzar, Muhammad Ali (July 2024, Proceedings of the ACM on Software Engineering)

SQL is the most commonly used front-end language for data-intensive scalable computing (DISC) applications due to its broad presence in new and legacy workflows and shallow learning curve. However, DISC-backed SQL introduces several layers of abstraction that significantly reduce the visibility and transparency of workflows, making it challenging for developers to find and fix errors in a query. When a query returns incorrect outputs, it takes a non-trivial effort to comprehend every stage of the query execution and find the root cause among the input data and complex SQL query. We aim to bring the benefits of step-through interactive debugging to DISC-powered SQL with DeSQL. Due to the declarative nature of SQL, there are no ordered atomic statements to place a breakpoint to monitor the flow of data. DeSQL’s automated query decomposition breaks a SQL query into its constituent sub queries, offering natural locations for setting breakpoints and monitoring intermediate data. However, due to advanced query optimization and translation in DISC systems, a user query rarely matches the physical execution, making it challenging to associate subqueries with their intermediate data. DeSQL performs fine-grained taint analysis to dynamically map the subqueries to their intermediate data, while also recognizing subqueries removed by the optimizers. For such subqueries, DeSQL efficiently regenerates the intermediate data from a nearby subquery’s data. On the popular TPC-DC benchmark, DeSQL provides a complete debugging view in 13% less time than the original job time while incurring an average overhead of 10% in addition to retaining Apache Spark’s scalability. In a user study comprising 15 participants engaged in two debugging tasks, we find that participants utilizing DeSQL identify the root cause behind a wrong query output in 74% less time than the de-facto, manual debugging.
more » « less
Full Text Available
A Metric for Measuring the Impact of Rare Paths on Program Coverage

https://doi.org/10.1109/SANER64311.2025.00042

Amour, Leo St; Tilevich, Eli; Gulzar, Muhammad Ali (March 2025, 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER))

Fuzzing has become a popular technique for discovering bugs and vulnerabilities. To increase the probability of finding bugs, developers should apply fuzzers that maximize program coverage. Program coverage typically measures the percentage of program lines or branches a fuzzer executes. However, these metrics fail to communicate the value of hitting a particular line, branch, or path. Many bugs manifest only within non-trivial control flows. To improve software quality, fuzzing non-trivial program paths should be more important than fuzzing trivial ones. This paper introduces rare-path coverage (RP-Coverage), a novel program coverage metric that conveys the value of discovering an unlikely control flow path. We have developed a new technique for estimating the probability of taking an execution path, which relies on probabilistic logic programming to declaratively express the logic for constructing and analyzing a probabilistic control flow graph. Our evaluation indicates RP-Coverage's promise as a metric for measuring fuzzing efficacy. Specifically, we observe that defects along rare paths-intuitively-substantially impact the effectiveness of fuzzers. However, we argue that existing fuzzing metrics fall short when conveying this significance. We also observe that the value of uncovering an unlikely path is better reflected by increases in RP-Coverage than existing metrics. Specifically, the average coverage increases are up to 49.5%, 11.1 %, and 15.4 % for RP-Coverage, line coverage, and branch coverage, respectively. This finding indicates that RP-Coverage is more elastic, or sensitive, to path probabilities and thus capable of more effectively quantifying a fuzzer's ability to discover unlikely program paths. As such, RP-Coverage demonstrates promise as a program coverage metric that enhances fuzzer fitness measures when supplementing standard criteria.
more » « less
Free, publicly-accessible full text available March 4, 2026
Natural Symbolic Execution-Based Testing for Big Data Analytics

https://doi.org/10.1145/3660825

Wu, Yaoxuan; Humayun, Ahmad; Gulzar, Muhammad Ali; Kim, Miryung (July 2024, Proceedings of the ACM on Software Engineering)

Symbolic execution is an automated test input generation technique that models individual program paths as logical constraints. However, the realism of concrete test inputs generated by SMT solvers often comes into question. Existing symbolic execution tools only seek arbitrary solutions for given path constraints. These constraints do not incorporate the naturalness of inputs that observe statistical distributions, range constraints, or preferred string constants. This results in unnatural-looking inputs that fail to emulate real-world data. In this paper, we extend symbolic execution with consideration for incorporating naturalness. Our key insight is that users typically understand the semantics of program inputs, such as the distribution of height or possible values of zipcode, which can be leveraged to advance the ability of symbolic execution to produce natural test inputs. We instantiate this idea in NaturalSym, a symbolic execution-based test generation tool for data-intensive scalable computing (DISC) applications. NaturalSym generates natural-looking data that mimics real-world distributions by utilizing user-provided input semantics to drastically enhance the naturalness of inputs, while preserving strong bug-finding potential. On DISC applications and commercial big data test benchmarks, NaturalSym achieves a higher degree of realism —as evidenced by a perplexity score 35.1 points lower on median, and detects 1.29× injected faults compared to the state-of-the-art symbolic executor for DISC, BigTest. This is because BigTest draws inputs purely based on the satisfiability of path constraints constructed from branch predicates, while NaturalSym is able to draw natural concrete values based on user-specified semantics and prioritize using these values in input generation. Our empirical results demonstrate that NaturalSym finds injected faults 47.8× more than NaturalFuzz (a coverage-guided fuzzer) and 19.1× more than ChatGPT. Meanwhile, TestMiner (a mining-based approach) fails to detect any injected faults. NaturalSym is the first symbolic executor that combines the notion of input naturalness in symbolic path constraints during SMT-based input generation. We make our code available at https://github.com/UCLA-SEAL/NaturalSym.
more » « less
Full Text Available
FedDefender: Backdoor Attack Defense in Federated Learning

https://doi.org/10.1145/3617574.3617858

Gill, Waris; Anwar, Ali; Gulzar, Muhammad Ali (December 2023, ACM)

Federated Learning (FL) is a privacy-preserving distributed machine learning technique that enables individual clients (e.g., user participants, edge devices, or organizations) to train a model on their local data in a secure environment and then share the trained model with an aggregator to build a global model collaboratively. In this work, we propose FedDefender, a defense mechanism against targeted poisoning attacks in FL by leveraging differential testing. FedDefender first applies differential testing on clients’ models using a synthetic input. Instead of comparing the output (predicted label), which is unavailable for synthetic input, FedDefender fingerprints the neuron activations of clients’ models to identify a potentially malicious client containing a backdoor. We evaluate FedDefender using MNIST and FashionMNIST datasets with 20 and 30 clients, and our results demonstrate that FedDefender effectively mitigates such attacks, reducing the attack success rate (ASR) to 10% without deteriorating the global model performance.
more » « less
Full Text Available
Co-dependence Aware Fuzzing for Dataflow-Based Big Data Analytics

https://doi.org/10.1145/3611643.3616298

Humayun, Ahmad; Kim, Miryung; Gulzar, Muhammad Ali (November 2023, Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering)

Full Text Available
Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking

https://doi.org/10.18653/v1/2024.findings-naacl.197

Kang, Hong Jin; Harel-Canada, Fabrice; Gulzar, Muhammad Ali; Peng, Nanyun; Kim, Miryung (January 2024, Association for Computational Linguistics)

Full Text Available
NaturalFuzz: Natural Input Generation for Big Data Analytics

https://doi.org/10.1109/ASE56229.2023.00034

Humayun, Ahmad; Wu, Yaoxuan; Kim, Miryung; Gulzar, Muhammad Ali (September 2023, IEEE)

Full Text Available
FedDebug: Systematic Debugging for Federated Learning Applications

https://doi.org/10.1109/ICSE48619.2023.00053

Gill, Waris; Anwar, Ali; Gulzar, Muhammad Ali (May 2023, IEEE/ACM 45th International Conference on Software Engineering)

In Federated Learning (FL), clients independently train local models and share them with a central aggregator to build a global model. Impermissibility to access clients' data and collaborative training make FL appealing for applications with data-privacy concerns, such as medical imaging. However, these FL characteristics pose unprecedented challenges for debugging. When a global model's performance deteriorates, identifying the responsible rounds and clients is a major pain point. Developers resort to trial-and-error debugging with subsets of clients, hoping to increase the global model's accuracy or let future FL rounds retune the model, which are time-consuming and costly. We design a systematic fault localization framework, Fedde-bug,that advances the FL debugging on two novel fronts. First, Feddebug enables interactive debugging of realtime collaborative training in FL by leveraging record and replay techniques to construct a simulation that mirrors live FL. Feddebug'sbreakpoint can help inspect an FL state (round, client, and global model) and move between rounds and clients' models seam-lessly, enabling a fine-grained step-by-step inspection. Second, Feddebug automatically identifies the client(s) responsible for lowering the global model's performance without any testing data and labels-both are essential for existing debugging techniques. Feddebug's strengths come from adapting differential testing in conjunction with neuron activations to determine the client(s) deviating from normal behavior. Feddebug achieves 100% accuracy in finding a single faulty client and 90.3% accuracy in finding multiple faulty clients. Feddebug's interactive de-bugging incurs 1.2% overhead during training, while it localizes a faulty client in only 2.1% of a round's training time. With FedDebug,we bring effective debugging practices to federated learning, improving the quality and productivity of FL application developers.
more » « less
Full Text Available

« Prev Next »

Search for: All records